Annotation of Clinical Narratives in Bulgarian language

Authors

Ivajlo Radev

Kiril Ivanov Simov

Galia Angelova

Svetla Boytcheva

Abstract

In this paper we describe the process of annotating clinical texts with morphosyntactic and semantic information. The corpus contains 1,300 discharge letters in Bulgarian for patients with endocrine and metabolic disorders. The annotated corpus will serve as a gold standard for evaluating information extraction on a test corpus of 6,200 discharge letters. The annotation is performed within the Clark system, an XML-based system for corpora development that provides mechanisms for semi-automatic annotation. First, a pipeline for Bulgarian morphosyntactic annotation and a cascaded regular grammar for semantic annotation are run; then rules for cleaning up frequent errors are applied. Finally, the result is checked manually. We also aim to adapt the morphosyntactic tagger to the domain of clinical narratives.
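The stages named in the abstract (tagging, a cascaded regular grammar, then cleaning rules) can be illustrated with a minimal Python sketch. It is not the Clark system or the authors' grammar: the toy lexicon, the tag names, the `QUANTITY` and `MEASUREMENT` labels, the cleaning rule, and the English example text are all assumptions made for illustration.

```python
import re

# Hypothetical toy lexicon standing in for a real morphosyntactic tagger.
TOY_LEXICON = {"glucose": "N", "insulin": "N", "mmol/l": "UNIT", "units": "UNIT"}


def tag_tokens(text):
    """Stage 1: give every whitespace-separated token a coarse tag."""
    nodes = []
    for token in text.split():
        if re.fullmatch(r"\d+(?:[.,]\d+)?", token):
            tag = "NUM"
        else:
            tag = TOY_LEXICON.get(token.lower(), "X")
        nodes.append((tag, token))
    return nodes


# Stage 2: each level is a regular expression over the label sequence.
# Later levels may refer to labels produced by earlier levels (the cascade).
CASCADE = [
    ("QUANTITY", r"NUM UNIT"),
    ("MEASUREMENT", r"N QUANTITY"),
]


def apply_level(nodes, label, pattern):
    """Group every run of nodes whose labels match `pattern` under `label`."""
    labels = [node[0] for node in nodes]
    # Map the character offset of each label in the encoded string to its index.
    starts, pos = {}, 0
    for index, lab in enumerate(labels):
        starts[pos] = index
        pos += len(lab) + 1
    encoded = "".join(lab + " " for lab in labels)
    out, consumed = [], 0
    # The lookbehind forces matches to begin at a label boundary.
    for match in re.finditer(r"(?<!\S)(?:%s) " % pattern, encoded):
        first = starts[match.start()]
        count = match.group().count(" ")
        out.extend(nodes[consumed:first])
        out.append((label, nodes[first:first + count]))
        consumed = first + count
    out.extend(nodes[consumed:])
    return out


def clean(nodes):
    """Stage 3 (hypothetical rule): dissolve a QUANTITY left outside a MEASUREMENT."""
    out = []
    for label, content in nodes:
        if label == "QUANTITY":
            out.extend(content)
        else:
            out.append((label, content))
    return out


def annotate(text):
    """Run tagging, the grammar cascade, and the cleaning rule in order."""
    nodes = tag_tokens(text)
    for label, pattern in CASCADE:
        nodes = apply_level(nodes, label, pattern)
    return clean(nodes)
```

Calling `annotate("Glucose 7.2 mmol/l")` yields a single `MEASUREMENT` node wrapping the noun and a nested `QUANTITY`. The final manual check described in the abstract has no counterpart in this sketch.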

Similar resources

Mining Association Rules from Clinical Narratives

We propose a method that processes raw informal medical texts (from health forums) and formal texts (outpatient records) in Bulgarian language in order to extract typical word co-occurrences in the form of association rules. When mining these rules we use some context information and small terminological lexicons to generalize the extracted frequent patterns. This allows to study informal expre...

Anaphora - Clause Annotation and Alignment Tool

The paper presents Anaphora – an OS and language independent tool for clause annotation and alignment, developed at the Department of Computational Linguistics, Institute for Bulgarian Language, Bulgarian Academy of Sciences. The tool supports automated sentence splitting and alignment and modes for manual monolingual annotation and multilingual alignment of sentences and clauses. Anaphora has ...

Design of an extensive information representation scheme for clinical narratives

BACKGROUND Knowledge representation frameworks are essential to the understanding of complex biomedical processes, and to the analysis of biomedical texts that describe them. Combined with natural language processing (NLP), they have the potential to contribute to retrospective studies by unlocking important phenotyping information contained in the narrative content of electronic health records...

Bulgarian Language Resources for Ontology-Based Semantic Search

This paper presents the language resources, which would facilitate the ontology-based semantic search. Some of these resources are language independent, such as the domain ontology. Some depend on the specific language: terminological lexicons, annotation grammars, sense disambiguation rules, gold standard corpus. Here we focus on the Bulgarian resources constructed in two domains for supportin...

Bulgarian X-language Parallel Corpus

The paper presents the methodology and the outcome of the compilation and the processing of the Bulgarian X-language Parallel Corpus (Bul-X-Cor) which was integrated as part of the Bulgarian National Corpus (BulNC). We focus on building representative parallel corpora which include a diversity of domains and genres, reflect the relations between Bulgarian and other languages and are consistent ...

Publication date: 2017
